Interactive Charts Describing Street Trees in Vancouver, British Columbia, Canada, Using Open Data Portal

Batool Azim

May 2023

Final Project for Data Visualization Course, University of British Columbia

Introduction

We learn about photosynthesis in school science classes: plants take in carbon dioxide and release oxygen, the opposite of what humans do. This is one of several reasons street trees contribute to the health of a city. This project is aimed at tree enthusiasts who wish to see a variety of genera (the plural of genus) and to choose which neighbourhood to visit based on their own criteria. It presents basic descriptive statistics of street trees in Vancouver, British Columbia, Canada, and of the neighbourhoods in which they occur.

Questions of Interest

A quick internet search showed that the large trees reported in Vancouver are located in parks. This project delivers different results because the dataset includes only street trees, excluding private trees and park trees. Matasci et al. (2018) published maps of the density, diameter, and height distributions of trees in Vancouver, BC. That paper focuses on trees taller than 15 metres, which differ from the trees in our dataset, and it does not show the distribution of genera by neighbourhood, which this project does (in Figure 3). Galle et al. (2021) discuss biodiversity in Vancouver; this project presents different information using interactive components and visuals. Read to the end to explore the interactive dashboard, which brings together the charts on Vancouver street trees. This project can be used alongside the Matasci et al. (2018) paper by tree enthusiasts visiting the city and by students of landscape architecture. The interactive components support zooming, panning, and clicking, and reveal helpful details such as the common and species names of outliers.

The dataset used is a subset of the publicly available data, published through the City of Vancouver. I am using a subset—5,000 observations out of over 150,000 observations—created by UBC staff. The subset was chosen at random and may not be representative of the complete dataset found on the City of Vancouver website (UBC Canvas, 2023).

The project is built in a Jupyter notebook using Python with the Altair visualization library and the pandas library. The data are accessible on the City of Vancouver website.

Analysis

The table below describes the columns of interest:

Table 1. Column Names

Column | Explanation | Unit
DIAMETER | DBH (diameter of the tree at breast height) | Inches
HEIGHT_RANGE_ID | 0-10, one id for every 10 feet (e.g., 0 = 0-10 ft, 1 = 10-20 ft, 2 = 20-30 ft, and 10 = 100+ ft) | NA
NEIGHBOURHOOD_NAME | City's defined local area in which the tree is located | NA
GENUS_NAME | Genus name | NA
SPECIES_NAME | Species name | NA
COMMON_NAME | Common name | NA

The table-structure code is attributed to rayryeng on Stack Overflow. The descriptions are copied from the dataset schema on the City of Vancouver website.
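The HEIGHT_RANGE_ID encoding in Table 1 can be turned into a readable label. The following is a minimal sketch, not code from the notebook: the function name `height_range_label` is hypothetical, and it assumes the id-to-feet mapping quoted in the table (ids 0 to 9 cover 10-foot bands, and id 10 means 100+ ft).

```python
def height_range_label(height_range_id: int) -> str:
    """Translate a HEIGHT_RANGE_ID into its height band in feet, per Table 1."""
    if not 0 <= height_range_id <= 10:
        raise ValueError("HEIGHT_RANGE_ID must be between 0 and 10")
    if height_range_id == 10:
        # The schema describes id 10 as an open-ended band.
        return "100+ ft"
    lower = height_range_id * 10
    return f"{lower}-{lower + 10} ft"


# Example: label the ids mentioned in the table description.
labels = [height_range_label(i) for i in (0, 1, 2, 10)]
```

A label like this could be shown in a tooltip so readers do not need to decode the id themselves.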

Number of null values and type of each column:

-

Date_planted and cultivar_name are each missing more than 50% of their values. I analysed the rows with null values: most have a diameter of less than 30 inches and height ids spanning 1 to 9, with the bulk between 2 and 4. Finally, to check that there is no association between cultivar name and date planted, I created a stroke chart using the code in the cell below. I found no clear pattern between the missing values and the other variables. The concentration of missing dates at small diameters may simply reflect that the majority of trees in the dataset have a diameter below 30 inches.
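The share of missing values per column can be computed in a few lines. This is a sketch in plain Python rather than the notebook's own code. The helper name `missing_share`, the upper-case column names, and the sample rows are assumptions for illustration only. With pandas, `df.isna().mean()` gives the same per-column fraction.

```python
def missing_share(rows, columns):
    """Return the fraction of missing (None or empty) values for each column."""
    total = len(rows)
    shares = {}
    for column in columns:
        missing = sum(1 for row in rows if row.get(column) in (None, ""))
        shares[column] = missing / total if total else 0.0
    return shares


# Hypothetical rows for illustration only, not taken from the dataset.
sample_rows = [
    {"DIAMETER": 5.0, "DATE_PLANTED": "2001-03-14", "CULTIVAR_NAME": None},
    {"DIAMETER": 12.0, "DATE_PLANTED": None, "CULTIVAR_NAME": None},
    {"DIAMETER": 8.5, "DATE_PLANTED": None, "CULTIVAR_NAME": "KWANZAN"},
    {"DIAMETER": 3.0, "DATE_PLANTED": None, "CULTIVAR_NAME": None},
]
shares = missing_share(sample_rows, ["DIAMETER", "DATE_PLANTED", "CULTIVAR_NAME"])
```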

I have no interest in cultivar_name. The main questions in this paper exclude age (derived as the current year minus date_planted) because I want to include all 5,000 data points in the visualizations, and more than half of them have no planting date.
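The age derivation mentioned above can be sketched as follows. The function name `tree_age` is hypothetical, and the sketch assumes planting dates are stored as ISO strings such as "1993-05-01"; the actual storage format in the notebook is not shown. Rows without a date return `None`, which is why an age-based chart would lose those trees.

```python
from datetime import date


def tree_age(date_planted, today=None):
    """Return the age in calendar years, or None when the date is missing.

    Assumes date_planted is an ISO string such as "1993-05-01".
    """
    if not date_planted:
        return None  # such rows would drop out of any age-based chart
    reference = today or date.today()
    return reference.year - date.fromisoformat(date_planted).year


# Hypothetical values for illustration only.
planting_dates = ["1993-05-01", None, "2010-11-20"]
ages = [tree_age(d, today=date(2023, 5, 27)) for d in planting_dates]
```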

Description of numeric values:

-

Date planted ranges from 1989 to 2019, a span of about 30 years. The columns that relate to my main questions of interest are diameter and height_range_id, which represents the height range in feet and is described in more detail in Table 1. Diameter ranges from 0 to 71 inches. Height ranges from id 0 to id 9.

I plotted the count distribution of trees by year planted and found that the majority were planted from 1992 to 2014. Fewer than 40 trees were planted per year from 2015 onward and in 1991 and earlier.

By running repeat plots on the numerical and categorical data, then plotting some variables manually to explore further, I found some information worth noting that is unrelated to the questions of interest. The majority of trees are between 8 and 30 years of age and between 20 and 30 ft in height. Most trees have a curb present, no root barrier, and no lot assigned. A density plot of the diameter peaks at around 5 inches, with the majority of trees less than 30 inches in diameter. A heatmap of genus by neighbourhood shows that ACER, PRUNUS, TILIA, FRAXINUS, and QUERCUS (the top 5 genera, as shown in Table 2) are present in every neighbourhood.

The columns that appeared to exhibit a correlation are listed below.

The Government of BC appears to use a specific system for civic numbers that relates to longitude, latitude, and on-street block.

-

The tables below show the most frequent genera and species.

The most frequent genera in the dataset are ACER and PRUNUS, at 1218 and 1050 trees respectively. The most frequent species, in descending order, are PRUNUS SERRULATA, ACER PLATANOIDES, PRUNUS CERASIFERA, and ACER RUBRUM.
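A frequency ranking like this can be produced with a simple counter. The sketch below uses the standard library rather than the notebook's pandas code. The helper name `top_values` and the sample rows are hypothetical, and the counts are illustrative rather than the dataset's.

```python
from collections import Counter


def top_values(rows, column, n=5):
    """Return the n most frequent non-empty values of a column with their counts."""
    counts = Counter(row[column] for row in rows if row.get(column))
    return counts.most_common(n)


# Hypothetical rows for illustration only.
sample_rows = [
    {"GENUS_NAME": "ACER"},
    {"GENUS_NAME": "PRUNUS"},
    {"GENUS_NAME": "ACER"},
    {"GENUS_NAME": "TILIA"},
    {"GENUS_NAME": "ACER"},
    {"GENUS_NAME": "PRUNUS"},
]
top_two = top_values(sample_rows, "GENUS_NAME", n=2)
```

The same helper works for species by passing a different column name.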

-

The first question I would like to answer is whether there is a relationship between height and diameter. The literature shows a positive relationship. The visual below explores this with points for the individual observations and a line connecting the median diameter at each height id. Hovering over a point of interest reveals other details such as the neighbourhood, species, and date planted. You can zoom and pan the chart by scrolling and dragging, respectively. A double click on the white space returns you to the default view. A click on an individual point opens another page with a web search for that tree. The drop-down list shows the distribution of points and the median diameter for a specific genus.

-

There is a positive relationship between diameter and height. There are fewer than 10 observations for height id 9, so the drop in median diameter from id 8 to id 9 could be attributed to a sample too small to be representative. The median diameter is 35 inches for trees between 80 and 90 feet (height id 8). The thickest tree is a CEDRUS DEODARA in Kitsilano.
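The median line in the chart corresponds to a grouped median. The following sketch shows one way to compute it in plain Python while also keeping the group size, so that thinly populated height ids (such as id 9 here) can be flagged. The helper name and the sample rows are hypothetical and not taken from the notebook.

```python
from collections import defaultdict
from statistics import median


def median_diameter_by_height(rows):
    """Map each HEIGHT_RANGE_ID to (median diameter, number of trees)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["HEIGHT_RANGE_ID"]].append(row["DIAMETER"])
    return {
        height_id: (median(diameters), len(diameters))
        for height_id, diameters in sorted(groups.items())
    }


# Hypothetical rows for illustration only.
sample_rows = [
    {"HEIGHT_RANGE_ID": 1, "DIAMETER": 3},
    {"HEIGHT_RANGE_ID": 1, "DIAMETER": 5},
    {"HEIGHT_RANGE_ID": 1, "DIAMETER": 4},
    {"HEIGHT_RANGE_ID": 2, "DIAMETER": 10},
    {"HEIGHT_RANGE_ID": 2, "DIAMETER": 12},
]
medians = median_diameter_by_height(sample_rows)
```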

-

The next question I'd like to answer is which neighbourhoods have a variety of diameters. The chart below answers this question. The diameters are binned by 10 inches, and the size of each point is relative to the count of trees in that bin, to cater to persons with colour vision deficiencies, which affect roughly 8% of men and under 1% of women. The colour scale runs from lightest to darkest as the count decreases, so that the small points (the outliers) stand out against the light background.

The literature does not include visuals of the diameter distribution by neighbourhood in Vancouver. This chart addresses that gap and can also be filtered by height id. The width of the diameter bins ranges from 2 to 10 inches depending on the chosen height id. After choosing a height id with the radio button, it is useful to zoom and pan by scrolling and dragging the chart. Hovering over a circle of interest displays the tree count. A double click on the white space resets the zoom and pan. The size and colour of the circles are readjusted when a different radio button is chosen. This chart makes it easy to identify neighbourhoods by diameter and height id.
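The binning behind this chart amounts to counting trees per neighbourhood and diameter bin. The sketch below is a plain-Python illustration under stated assumptions: bins are half-open intervals starting at multiples of the bin width, the helper name `bin_counts` is hypothetical, and the sample rows are invented. The chart library's own binning may place edges differently.

```python
from collections import Counter


def bin_counts(rows, bin_width=10):
    """Count trees per (neighbourhood, bin start); bins are [start, start + width)."""
    counts = Counter()
    for row in rows:
        bin_start = int(row["DIAMETER"] // bin_width) * bin_width
        counts[(row["NEIGHBOURHOOD_NAME"], bin_start)] += 1
    return counts


# Hypothetical rows for illustration only.
sample_rows = [
    {"NEIGHBOURHOOD_NAME": "KITSILANO", "DIAMETER": 3.0},
    {"NEIGHBOURHOOD_NAME": "KITSILANO", "DIAMETER": 9.5},
    {"NEIGHBOURHOOD_NAME": "KITSILANO", "DIAMETER": 71.0},
    {"NEIGHBOURHOOD_NAME": "SHAUGHNESSY", "DIAMETER": 10.0},
    {"NEIGHBOURHOOD_NAME": "SHAUGHNESSY", "DIAMETER": 25.0},
]
counts = bin_counts(sample_rows)
```

Each count would then drive both the size and the colour of one circle.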

-

The largest circles fall in the bin of diameters under 10 inches. Most trees are below 50 inches, and the tree count decreases in the majority of neighbourhoods as the diameter increases. Shaughnessy and Kitsilano have a large range of diameters. Renfrew-Collingwood and Kensington-Cedar Cottage are dense with trees less than 10 inches in diameter.

-

Using the interactivity in Figure 2, we found some interesting information such as the diameters and height ids by neighbourhood. The next figure answers which neighbourhood has the largest number of unique genera. This map is unique, as the visuals by Matasci et al. (2018) do not include neighbourhood names or genus counts. Galle et al. (2021) discussed the biodiversity in Vancouver, but their paper contains no visuals. I supplement the map with a bar chart showing the genus count from highest to lowest, because the map colours are difficult to distinguish for mid-range genus counts. Hovering over the map gives the neighbourhood name along with its genus count. Hovering over a bar displays the exact genus count.

Renfrew-Collingwood has the highest genus count, with 46 unique genera. West End has half that, at 23 unique genera. Strathcona has the lowest.
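The genus count per neighbourhood is a count of distinct values within each group. A minimal sketch, assuming rows with NEIGHBOURHOOD_NAME and GENUS_NAME fields; the helper name is hypothetical and the sample rows are invented, so the counts below are not the dataset's.

```python
from collections import defaultdict


def genus_count_by_neighbourhood(rows):
    """Return (neighbourhood, unique genus count) pairs, highest count first."""
    genera = defaultdict(set)
    for row in rows:
        genera[row["NEIGHBOURHOOD_NAME"]].add(row["GENUS_NAME"])
    counts = {name: len(names) for name, names in genera.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows for illustration only.
sample_rows = [
    {"NEIGHBOURHOOD_NAME": "STRATHCONA", "GENUS_NAME": "ACER"},
    {"NEIGHBOURHOOD_NAME": "RENFREW-COLLINGWOOD", "GENUS_NAME": "ACER"},
    {"NEIGHBOURHOOD_NAME": "RENFREW-COLLINGWOOD", "GENUS_NAME": "ACER"},
    {"NEIGHBOURHOOD_NAME": "RENFREW-COLLINGWOOD", "GENUS_NAME": "PRUNUS"},
    {"NEIGHBOURHOOD_NAME": "RENFREW-COLLINGWOOD", "GENUS_NAME": "TILIA"},
]
ranking = genus_count_by_neighbourhood(sample_rows)
```

Sorting from highest to lowest mirrors the ordering of the bar chart.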

-

The next two tasks are finding the neighbourhood with the largest median diameter and the one with the largest tree density. Both questions can be answered with one scatter plot, with median diameter on the x-axis and tree count on the y-axis. Hovering shows the neighbourhood name. You can also navigate this scatter plot by clicking on the neighbourhood of interest in Figure 3A or on the circle of interest in Figure 4. The neighbourhood chosen through Figure 3A is highlighted in the scatter plot with a black stroke, while the remaining circles keep white strokes. The circle chosen in the scatter plot is highlighted on the map in Figure 3A, as the opacity of the remaining neighbourhoods decreases. A double click on the white area of the map or scatter plot clears the selection. Finally, circles are also coloured by genus count.

Dunbar-Southlands has the largest mean diameter and Renfrew-Collingwood has the highest tree count. This scatter plot suggests that there may be a positive relationship between tree density and genus count, as the points get lighter from the top to the bottom of the y-axis. I plotted a scatter plot of the two variables (not shown here), and there is indeed a strong positive relationship. It appears that the City of Vancouver diversifies its genus choices, choosing different genera as it plants more trees. This is excellent, as the importance of tree biodiversity in cities is highlighted by Galle et al. (2021).
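One way to put a number on the relationship between tree count and genus count is a correlation coefficient. The text does not say how the strength was assessed, so the Pearson coefficient below is an assumption, and the per-neighbourhood values are hypothetical rather than taken from the dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Hypothetical per-neighbourhood values for illustration only.
tree_counts = [100, 200, 300, 400]
genus_counts = [20, 28, 35, 46]
r = pearson_r(tree_counts, genus_counts)
```

A value of `r` close to 1 would be consistent with the strong positive relationship described above.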

-

The above chart (Figure 4) shows the mean diameter spanning from 7 to 14 inches, a 100% increase from the lowest to the highest point. My hypothesis is that genus, rather than neighbourhood, is the responsible factor. Below is a faceted density plot of the diameters of the top three genera in the dataset. There are no interactive components in this chart.

The three most frequent genera peak at different diameters. PRUNUS has a bimodal distribution, peaking around 5 and 15 inches. ACER peaks around 5 inches and TILIA around 15 inches. There is no conclusive evidence here, as this is only a density visualization of three genera. Genus diversity may contribute to the differences in median diameter by neighbourhood, but further analysis is needed.

Discussion

The figures answered all the questions posed in the introduction. There is a positive relationship between height and diameter, as expected and as the literature supports. Shaughnessy and Kitsilano are the go-to neighbourhoods for a large range of diameters. It is worth noting that most of the outlier bins (the trees above 50 inches in diameter) contain only one tree. A study on street trees in New York State, USA, found that less than three percent of street trees were above 106.7 centimetres (42 inches) in diameter (Cowett & Bassuk, 2014). Renfrew-Collingwood has the highest unique genus count, at 46 genera, and the highest tree count, at 384 trees; it is an excellent choice for landscape architects and tree enthusiasts. Dunbar-Southlands has the largest mean diameter, at 14 inches. A very interesting observation is noted in the discussion of Figure 4: there is a positive correlation between tree diversity by genus and tree density. The most common species in Vancouver is PRUNUS SERRULATA, at 463 of the 5,000 trees in the dataset, which comes as no surprise, as photos of this tree flood social media every spring. The thickest tree in the dataset is a CEDRUS DEODARA in Kitsilano.

Three of the four visualizations in the dashboard incorporate neighbourhood names, which answers the bulk of the posed questions. I expected Downtown to have the fewest trees, being a city centre, but it has over 150 trees. In a future paper, it would be excellent to tie tree age to diameter and height and to find out where the oldest street tree in Vancouver is. A deeper analysis of Figure 5 would also help reach a more conclusive answer on whether diversity by genus is the variable affecting the differences in median diameter by neighbourhood. The dashboard below brings four visualizations together and has many interactive components to answer a large variety of questions. A year-planted slider could be added in a future version of this paper to show how planting year interacts with the plots.

The strengths of this paper are the large sample size and the uniqueness of the questions answered. A literature search shows that the information communicated here has not been visualized or discussed in depth in other papers. A limitation is the use of subset data. The City of Vancouver updates the variables on weekdays; however, it is possible that this subset was created more than 60 days ago, leaving some values outdated.

Dashboard

The interactivity instructions for each chart are in the Analysis section, above each figure. Scroll up to the appropriate figure number to review any missed instructions. This dashboard answers all the questions posed at the beginning of the paper. Figures 3A and 4 (the top two plots) interact with one another. Figure 2 (bottom left) interacts with the height id radio buttons. Figure 1 (bottom right) interacts with the genus drop-down menu. Remember to hover over all charts and to left-click to select neighbourhoods in Figure 3A and points in Figure 4. Finally, do not forget to zoom and pan on the bottom two charts using the scroll wheel, and to left-click on individual points of interest in Figure 1 for a web search of the tree.

References

City of Vancouver. (2023, May 27). Street trees. City of Vancouver Open Data Portal. https://opendata.vancouver.ca/explore/dataset/street-trees/information/?disjunctive.species_name&disjunctive.common_name&disjunctive.height_range_id&disjunctive.on_street&disjunctive.neighbourhood_name

Cowett, F. D., & Bassuk, N. L. (2014). Statewide assessment of street trees in New York State, USA. Urban Forestry & Urban Greening, 13(2), 213–220. https://doi.org/10.1016/j.ufug.2014.02.001

Galle, N. J., Halpern, D., Nitoslawski, S., Duarte, F., Ratti, C., & Pilla, F. (2021). Mapping the diversity of street tree inventories across eight cities internationally using open data. Urban Forestry & Urban Greening, 61, 127099. https://doi.org/10.1016/j.ufug.2021.127099

Matasci, G., Coops, N. C., Williams, D. A., & Page, N. (2018). Mapping tree canopies in urban environments using airborne laser scanning (ALS): A Vancouver case study. Forest Ecosystems, 5(1). https://doi.org/10.1186/s40663-018-0146-y

University of British Columbia. (n.d.). Data Visualization · course 3 of UBC’s key capabilities in Data Science Program. Data Visualization. https://viz-learn.mds.ubc.ca/en